NOTE: All files are imported using the rules defined below. All files except Staden files (which can use non-standard nucleotide characters) are filtered to accept only legal nucleotide characters as defined by international standards (the IUB table is shown in the Nucleotide Character Table help). Lower case letters are converted to upper case and U’s are converted to T’s. Any non-nucleotide characters are discarded, including all numbers, RETURNS, TABS, and line feeds.
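The filtering described in this note can be sketched in Python as follows. This is an illustration only, not the Gene Construction Kit's own code. The names clean_sequence and IUB_CODES are hypothetical, and the legal character set is assumed to be the standard IUB nucleotide codes; the Nucleotide Character Table help remains the authoritative list.

```python
# Assumption: the legal characters are the standard IUB nucleotide codes.
# U is not listed because it is converted to T before the check.
IUB_CODES = frozenset("ACGTRYKMSWBDHVN")


def clean_sequence(raw: str) -> str:
    """Apply the import filter: upper-case, U -> T, drop everything else."""
    kept = []
    for ch in raw.upper():
        if ch == "U":
            ch = "T"
        # Digits, spaces, tabs, returns, and line feeds all fail this test
        # and are therefore discarded.
        if ch in IUB_CODES:
            kept.append(ch)
    return "".join(kept)


if __name__ == "__main__":
    print(clean_sequence("  1 acgu nnry\r\n 11 ggcc\t"))
```

Staden files would bypass a filter like this, since they may contain non-standard nucleotide characters.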
DNA Inspector: Standard file format generated by DNA Inspector IIe or DNA Inspector III. These file formats are listed in the corresponding DNA Inspector manuals.
EMBL: Starting at the first line of the file, every line up to and including the line that starts with SQ is considered a comment. The sequence starts on the line immediately after the SQ line. Sequence information continues until //.
GCG: Everything up to two adjacent periods (..) is considered comment. Everything after the periods is considered sequence.
GenBank: Everything is put into comments up to and including the line starting with ORIGIN. Any text after this line is considered sequence.
Intelligenetics: All lines beginning with a semicolon are considered comments. The first line without a semicolon is the sequence ID (name). Everything following this line is considered sequence.
NBRF: The first line is the header line and the second line is the title; both of these lines are put into comments. The sequence starts on the third line and continues until an asterisk is reached. Everything following the asterisk is considered comment.
Pearson: Any line starting with > (greater than) or ; (semicolon) is a comment line. A line starting with > also indicates the start of a new sequence. Pearson files can have multiple sequences in the same file, but the Gene Construction Kit will only import the first one. Sequence lines are those that do not start with > or ;.
Staden: Sequence and comments are mixed. All comments start with < (less than) and end with > (greater than). Everything else is considered sequence. The Gene Construction Kit expects Staden sequence files to start with a comment.
Text: Everything in a TEXT file is considered sequence. All non-nucleotide characters are removed, lower case letters are converted to upper case letters, and U’s are converted to T’s.
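As an illustration of how these rules separate comments from sequence, the following Python sketch covers four of the formats above (EMBL, GenBank, GCG, and Staden). It is a minimal sketch based only on the descriptions in this section, not the Gene Construction Kit's actual importer, and all function names are hypothetical. The sequence text returned for EMBL, GenBank, and GCG is unfiltered and would still need the character filtering described in the NOTE.

```python
import re


def split_embl(text):
    """Comments run through the SQ line; sequence runs until //."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("SQ"):
            seq_lines = []
            for seq_line in lines[i + 1:]:
                if seq_line.startswith("//"):
                    break
                seq_lines.append(seq_line)
            return "\n".join(lines[: i + 1]), "\n".join(seq_lines)
    raise ValueError("no SQ line found")


def split_genbank(text):
    """Comments run through the ORIGIN line; the rest is sequence."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("ORIGIN"):
            return "\n".join(lines[: i + 1]), "\n".join(lines[i + 1:])
    raise ValueError("no ORIGIN line found")


def split_gcg(text):
    """Everything before the first '..' is comment; the rest is sequence."""
    comments, separator, sequence = text.partition("..")
    if not separator:
        raise ValueError("no '..' separator found")
    return comments, sequence


def split_staden(text):
    """Comments are enclosed in <...>; everything else is sequence."""
    comments = re.findall(r"<([^>]*)>", text)
    sequence = re.sub(r"<[^>]*>", "", text)
    return comments, sequence


if __name__ == "__main__":
    embl = "ID   EXAMPLE\nSQ   Sequence 8 BP;\n     acgtacgt        8\n//\n"
    print(split_embl(embl))
    print(split_staden("<example comment>ACGT<another>-NN"))
```

Each function returns the comment portion first and the sequence portion second. Staden comments come back as a list because comments and sequence are interleaved in that format.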